# Replication Package
- Paper: "One Threshold Doesn't Fit All: Tailoring Machine Learning Predictions of Consumer Default for Lower-Income Areas"
- Authors: Vitaly Meursault, Daniel Moulton, Larry Santucci, and Nathan Schor
- Journal: Journal of Policy Analysis and Management (JPAM)

## Data Availability

### Primary Data
The data that support the findings of this study come from the Federal Reserve Bank of New York Consumer Credit Panel/Equifax (CCP) data. Because of contractual limitations, access to the CCP microdata is limited to Federal Reserve System researchers and their coauthors. Individual analyses also use the Federal Reserve Board’s Capital Assessments and Stress Testing (Y-14M) report and the HMDA-McDash-CRISM dataset, which combines Home Mortgage Disclosure Act (HMDA) data, Black Knight McDash (McDash) data, and Equifax Credit Risk Insight Servicing data linked to the McDash data (CRISM). Access to these datasets is also restricted. Note that certain variable names have been removed per contractual limitations on sharing data content or attributes.

### Supplementary Data
The following supplementary data are included in this package:
- Rolling window specifications
- FIPS codes
- Community Reinvestment Act Low and Moderate Income area designations

## Code Structure

The replication code is organized in three directories:

1. `01_data_prep/`
   - Pulls CCP data from servers
   - Filters data
   - Creates outcome variables for ML pipeline
   - Note: Confidential CCP variable names have been removed per the Equifax contract

2. `02_ml_pipeline/`
   - Trains machine learning models
   - Collects predictions
   - Generates model performance figures over time

3. `03_thresholds_analysis/`
   - Generates decision thresholds
   - Creates remaining paper results

## Execution
To run the full pipeline, execute the main script `_00_run_all.R`. It calls all other scripts sequentially in the proper order.
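For readers who want a sense of what a sequential driver of this kind looks like, the following is a minimal sketch, not the contents of `_00_run_all.R`. It assumes that scripts sit directly inside the three directories listed under Code Structure, that file names sort into the intended run order, and that stages pass results to each other through files on disk rather than in-memory objects. The helper names `list_pipeline_scripts` and `run_all` are hypothetical.

```r
# Hypothetical sketch of a run-all driver; the real `_00_run_all.R` may differ.
pipeline_dirs <- c("01_data_prep", "02_ml_pipeline", "03_thresholds_analysis")

# Collect the .R scripts of each stage, keeping the stage order given in `dirs`.
list_pipeline_scripts <- function(root = ".", dirs = pipeline_dirs) {
  scripts <- lapply(dirs, function(d) {
    # Assumption: alphabetical file names reflect the run order within a stage
    sort(list.files(file.path(root, d), pattern = "\\.R$", full.names = TRUE))
  })
  unlist(scripts, use.names = FALSE)
}

# Source every script in order and stop at the first error.
run_all <- function(root = ".", dirs = pipeline_dirs) {
  scripts <- list_pipeline_scripts(root, dirs)
  for (script in scripts) {
    message("Running ", script)
    # Assumption: stages communicate via files, so each script gets its own environment
    source(script, local = new.env())
  }
  invisible(scripts)
}
```

Calling `run_all()` from the package root would then run the three stages in order. If the actual scripts share objects in the global environment, `source(script)` without the `local` argument would be the closer match.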


